This notebook is better viewed rendered as slides. You can convert it to slides and view them by:
$ ipython nbconvert --to slides --post serve <this-notebook-name.ipynb>
This and other related IPython notebooks can be found at the course github repository:
Note: In case you are -still- wondering, a maximization problem can be posed as the minimization one: $\min\ -\vec{F}(\vec{x})$.
Usually, does not exists a unique solution that minimizes all objective functions simultaneously, but, instead, a set of equally good trade-off solutions.
As usual, we need some initialization and configuration.
In [1]:
import time, array, random, copy, math
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn
seaborn.set(style="whitegrid")
from deap import algorithms, base, benchmarks, tools, creator
Planting a constant seed to always have the same results (and avoid surprises in class). -you should not do this in a real-world case!
In [2]:
random.seed(a=42)
In [3]:
creator.create("FitnessMin", base.Fitness, weights=(-1.0,-1.0))
creator.create("Individual", array.array, typecode='d',
fitness=creator.FitnessMin)
Preparing a DEAP toolbox
with DTLZ2.
In [4]:
toolbox = base.Toolbox()
In [5]:
BOUND_LOW, BOUND_UP = 0.0, 1.0
NDIM = 2
toolbox.register("evaluate", lambda ind: benchmarks.dtlz2(ind, 2))
Defining components of the evolutionary algorithm.
In [6]:
def uniform(low, up, size=None):
try:
return [random.uniform(a, b) for a, b in zip(low, up)]
except TypeError:
return [random.uniform(a, b) for a, b in zip([low] * size, [up] * size)]
toolbox.register("attr_float", uniform, BOUND_LOW, BOUND_UP, NDIM)
toolbox.register("individual", tools.initIterate, creator.Individual, toolbox.attr_float)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)
Create an example random population and evaluate the individuals.
In [7]:
num_samples = 50
limits = [np.arange(BOUND_LOW, BOUND_UP, (BOUND_UP - BOUND_LOW)/num_samples)] * NDIM
sample_x = np.meshgrid(*limits)
In [8]:
flat = []
for i in range(len(sample_x)):
x_i = sample_x[i]
flat.append(x_i.reshape(num_samples ** NDIM))
In [9]:
example_pop = toolbox.population(n=num_samples ** NDIM)
In [10]:
for i, ind in enumerate(example_pop):
for j in range(len(flat)):
ind[j] = flat[j][i]
In [11]:
fitnesses = toolbox.map(toolbox.evaluate, example_pop)
for ind, fit in zip(example_pop, fitnesses):
ind.fitness.values = fit
We also need a_given_individual
.
In [12]:
a_given_individual = toolbox.population(n=1)[0]
a_given_individual[0] = 0.3
a_given_individual[1] = 0.2
In [13]:
a_given_individual.fitness.values = toolbox.evaluate(a_given_individual)
DEAP comes with a Pareto dominance relation that we can use.
In [14]:
def dominates(x,y):
return tools.emo.isDominated(x.fitness.values, y.fitness.values)
Lets compute the set of individuals that are dominated
by a_given_individual
, the ones that dominate it (its dominators
) and the remaining ones.
In [15]:
dominated = [ind for ind in example_pop if dominates(a_given_individual, ind)]
dominators = [ind for ind in example_pop if dominates(ind, a_given_individual)]
others = [ind for ind in example_pop if not ind in dominated and not ind in dominators]
For a_given_individual
(blue dot) we can now plot those that are dominated by it (in green), those that dominate it (in red) and those that are uncomparable.
In [16]:
plt.figure(figsize=(5,5))
for ind in dominators:
plt.plot(ind.fitness.values[0], ind.fitness.values[1], 'r.', alpha=0.5)
for ind in dominated:
plt.plot(ind.fitness.values[0], ind.fitness.values[1], 'g.', alpha=0.5)
for ind in others:
plt.plot(ind.fitness.values[0], ind.fitness.values[1], 'k.', ms=4)
plt.plot(a_given_individual.fitness.values[0], a_given_individual.fitness.values[1], 'bo', ms=11);
What about decision space?
In [17]:
plt.figure(figsize=(5,5))
for ind in dominators:
plt.plot(ind[0], ind[1], 'r.', alpha=0.5)
for ind in dominated:
plt.plot(ind[0], ind[1], 'g.', alpha=0.5)
for ind in others:
plt.plot(ind[0], ind[1], 'k.', ms=4)
plt.plot(a_given_individual[0], a_given_individual[1], 'bo', ms=11);
Obtain the nondominated front
In [18]:
non_dom = tools.sortNondominated(example_pop, k=len(example_pop), first_front_only=True)[0]
In [19]:
plt.figure(figsize=(5,5))
for ind in example_pop:
plt.plot(ind.fitness.values[0], ind.fitness.values[1], 'k.', ms=4)
for ind in non_dom:
plt.plot(ind.fitness.values[0], ind.fitness.values[1], 'ro', alpha=0.74, ms=7)
Each of these categories store individuals that are only dominated by the elements of the previous categories, $$ \begin{array}{rl} \forall \vec{x}\in\set{F}i: &\exists \vec{y}\in\set{F}{i-1} \text{ such that } \vec{y}\dom\vec{x},\text{ and }\
&\not\exists\vec{z}\in \set{P}_t\setminus\left( \set{F}_1\cup\ldots\cup\set{F}_{i-1}
\right)\text{ that }\vec{z}\dom\vec{x}\,;
\end{array} $$ with $\mathcal{F}_1$ equal to $\mathcal{P}_t^\ast$, the set of non-dominated individuals of $\mathcal{P}_t$.
After all individuals are ranked a local crowding distance is assigned to them.
The assignment process goes as follows,
{f_m\left(\vec{x}_{I_{1}}\right)-f_m\left(\vec{x}_{I_{f_l}}\right)}\,.
$$Here the $\mathrm{sort}\left(\set{F},m\right)$ function produces an ordered index vector $\vec{I}$ with respect to objective function $m$.
Sorting the population by rank and distance.
Now we have key element of the the non-dominated sorting GA.
In [14]:
toolbox = base.Toolbox()
Let's try DTLZ3, a more complex problem that DTLZ2 as it has many local Pareto-optimal fronts that run parallel to global one.
In [15]:
toolbox.register("evaluate", lambda ind: benchmarks.dtlz3(ind, 2))
In [16]:
toolbox.register("attr_float", uniform, BOUND_LOW, BOUND_UP, NDIM)
toolbox.register("individual", tools.initIterate, creator.Individual, toolbox.attr_float)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)
In [17]:
toolbox.register("mate", tools.cxSimulatedBinaryBounded, low=BOUND_LOW, up=BOUND_UP, eta=20.0)
toolbox.register("mutate", tools.mutPolynomialBounded, low=BOUND_LOW, up=BOUND_UP, eta=20.0, indpb=1.0/NDIM)
toolbox.register("select", tools.selNSGA2)
Let's also use the toolbox
to pass other configuration to our function. This will show itself usefull when performing massive experiments.
In [18]:
toolbox.pop_size = 50
toolbox.max_gen = 3000
toolbox.mut_prob = 0.2
In [19]:
def nsga_ii(toolbox, stats=None, verbose=False):
pop = toolbox.population(n=toolbox.pop_size)
pop = toolbox.select(pop, len(pop))
return algorithms.eaMuPlusLambda(pop, toolbox, mu=toolbox.pop_size,
lambda_=toolbox.pop_size,
cxpb=1-toolbox.mut_prob,
mutpb=toolbox.mut_prob,
stats=stats,
ngen=toolbox.max_gen,
verbose=verbose)
We are now ready to run our NSGA-II.
In [20]:
%time res, logbook = nsga_ii(toolbox)
We can now get the Pareto fronts in the results (res
).
In [21]:
fronts = tools.emo.sortLogNondominated(res, len(res))
In [22]:
plot_colors = ('b','r', 'g', 'm', 'y', 'k', 'c')
fig, ax = plt.subplots(1, figsize=(4,4))
for i,inds in enumerate(fronts):
par = [toolbox.evaluate(ind) for ind in inds]
df = pd.DataFrame(par)
df.plot(ax=ax, kind='scatter', label='Front ' + str(i+1),
x=df.columns[0], y=df.columns[1],
color=plot_colors[i % len(plot_colors)])
plt.xlabel('$f_1(\mathbf{x})$');plt.ylabel('$f_2(\mathbf{x})$');
We create a stats
to store the individuals not only their objective function values.
In [23]:
stats = tools.Statistics()
stats.register("pop", copy.deepcopy)
In [24]:
toolbox.max_gen = 4000 # we need more generations!
Re-run the algorithm to get the data necessary for plotting.
In [25]:
%time res, logbook = nsga_ii(toolbox, stats=stats)
In [26]:
from JSAnimation import IPython_display
import matplotlib.colors as colors
from matplotlib import animation
In [27]:
def animate(frame_index, logbook):
'Updates all plots to match frame _i_ of the animation.'
ax.clear()
fronts = tools.emo.sortLogNondominated(logbook.select('pop')[frame_index],
len(logbook.select('pop')[frame_index]))
for i,inds in enumerate(fronts):
par = [toolbox.evaluate(ind) for ind in inds]
df = pd.DataFrame(par)
df.plot(ax=ax, kind='scatter', label='Front ' + str(i+1),
x=df.columns[0], y =df.columns[1], alpha=0.47,
color=plot_colors[i % len(plot_colors)])
ax.set_title('$t=$' + str(frame_index))
ax.set_xlabel('$f_1(\mathbf{x})$');ax.set_ylabel('$f_2(\mathbf{x})$')
return None
In [28]:
fig = plt.figure(figsize=(4,4))
ax = fig.gca()
anim = animation.FuncAnimation(fig, lambda i: animate(i, logbook),
frames=len(logbook), interval=60,
blit=True)
anim
The previous animation makes the notebook too big for on-line viewing. To circumvent this, it is better to save the animation as video and upload it to YouTube.
In [29]:
anim.save('nsgaii-dtlz3.mp4', fps=15, bitrate=-1, dpi=500)
In [30]:
from IPython.display import YouTubeVideo
YouTubeVideo('Cm7r4cJq59s')
Out[30]:
Here it is clearly visible how the algorithm "jumps" from one local-optimum to a better one as evolution takes place.
Each problem instance is meant to test the algorithms with regard with a given feature: local optima, convexity, discontinuity, bias, or a combination of them.
DEAP still lacks some of these problems so I will grab the DTLZ5, DTLZ6 and DTLZ7 problems from inspyred and adapt them for DEAP.
Note to self: sent these problems to the DEAP guys.
DTLZ5 and DTLZ6 have an $m-1$-dimensional Pareto-optimal front.
In [31]:
def dtlz5(ind, n_objs):
from functools import reduce
g = lambda x: sum([(a - 0.5)**2 for a in x])
gval = g(ind[n_objs-1:])
theta = lambda x: math.pi / (4.0 * (1 + gval)) * (1 + 2 * gval * x)
fit = [(1 + gval) * math.cos(math.pi / 2.0 * ind[0]) *
reduce(lambda x,y: x*y, [math.cos(theta(a)) for a in ind[1:]])]
for m in reversed(range(1, n_objs)):
if m == 1:
fit.append((1 + gval) * math.sin(math.pi / 2.0 * ind[0]))
else:
fit.append((1 + gval) * math.cos(math.pi / 2.0 * ind[0]) *
reduce(lambda x,y: x*y, [math.cos(theta(a)) for a in ind[1:m-1]], 1) *
math.sin(theta(ind[m-1])))
return fit
In [32]:
def dtlz6(ind, n_objs):
from functools import reduce
gval = sum([a**0.1 for a in ind[n_objs-1:]])
theta = lambda x: math.pi / (4.0 * (1 + gval)) * (1 + 2 * gval * x)
fit = [(1 + gval) * math.cos(math.pi / 2.0 * ind[0]) *
reduce(lambda x,y: x*y, [math.cos(theta(a)) for a in ind[1:]])]
for m in reversed(range(1, n_objs)):
if m == 1:
fit.append((1 + gval) * math.sin(math.pi / 2.0 * ind[0]))
else:
fit.append((1 + gval) * math.cos(math.pi / 2.0 * ind[0]) *
reduce(lambda x,y: x*y, [math.cos(theta(a)) for a in ind[1:m-1]], 1) *
math.sin(theta(ind[m-1])))
return fit
DTLZ7 has many disconnected Pareto-optimal fronts.
In [33]:
def dtlz7(ind, n_objs):
gval = 1 + 9.0 / len(ind[n_objs-1:]) * sum([a for a in ind[n_objs-1:]])
fit = [ind for ind in ind[:n_objs-1]]
fit.append((1 + gval) * (n_objs - sum([a / (1.0 + gval) * (1 + math.sin(3 * math.pi * a)) for a in ind[:n_objs-1]])))
return fit
How does our NSGA-II behaves when faced with different benchmark problems?
In [34]:
problem_instances = {'ZDT1': benchmarks.zdt1, 'ZDT2': benchmarks.zdt2,
'ZDT3': benchmarks.zdt3, 'ZDT4': benchmarks.zdt4,
'DTLZ1': lambda ind: benchmarks.dtlz1(ind,2),
'DTLZ2': lambda ind: benchmarks.dtlz2(ind,2),
'DTLZ3': lambda ind: benchmarks.dtlz3(ind,2),
'DTLZ4': lambda ind: benchmarks.dtlz4(ind,2, 100),
'DTLZ5': lambda ind: dtlz5(ind,2),
'DTLZ6': lambda ind: dtlz6(ind,2),
'DTLZ7': lambda ind: dtlz7(ind,2)}
In [35]:
toolbox.max_gen = 1000
stats = tools.Statistics(lambda ind: ind.fitness.values)
stats.register("obj_vals", np.copy)
def run_problem(toolbox, problem):
toolbox.register('evaluate', problem)
return nsga_ii(toolbox, stats=stats)
Running NSGA-II solving all problems. Now it takes longer.
In [36]:
%time results = {problem: run_problem(toolbox, problem_instances[problem]) for problem in problem_instances}
Creating this animation takes more programming effort.
In [37]:
class MultiProblemAnimation:
def init(self, fig, results):
self.results = results
self.axs = [fig.add_subplot(3,4,i+1) for i in range(len(results))]
self.plots =[]
for i, problem in enumerate(sorted(results)):
(res, logbook) = self.results[problem]
pop = pd.DataFrame(data=logbook.select('obj_vals')[0])
plot = self.axs[i].plot(pop[0], pop[1], 'b.', alpha=0.47)[0]
self.plots.append(plot)
fig.tight_layout()
def animate(self, t):
'Updates all plots to match frame _i_ of the animation.'
for i, problem in enumerate(sorted(results)):
#self.axs[i].clear()
(res, logbook) = self.results[problem]
pop = pd.DataFrame(data=logbook.select('obj_vals')[t])
self.plots[i].set_data(pop[0], pop[1])
self.axs[i].set_title(problem + '; $t=' + str(t)+'$')
self.axs[i].set_xlim((0, max(1,pop.max()[0])))
self.axs[i].set_ylim((0, max(1,pop.max()[1])))
return self.axs
In [38]:
mpa = MultiProblemAnimation()
In [39]:
fig = plt.figure(figsize=(14,6))
anim = animation.FuncAnimation(fig, mpa.animate, init_func=mpa.init(fig,results),
frames=toolbox.max_gen, interval=60, blit=True)
anim
Saving the animation as video and uploading it to youtube.
In [40]:
anim.save('nsgaii-benchmarks.mp4', fps=15, bitrate=-1, dpi=500)
In [41]:
YouTubeVideo('8t-aWcpDH0U')
Out[41]:
Read more about this in the class materials.
Evolutionary algorithms are stochastic algorithms, therefore their results must be assessed by repeating experiments until you reach an statistically valid conclusion.
As usual we need to establish some notation:
toolbox
instances to define experiments. We start by creating a toolbox
that will contain the configuration that will be shared across all experiments.
In [42]:
toolbox = base.Toolbox()
In [43]:
BOUND_LOW, BOUND_UP = 0.0, 1.0
NDIM = 30
# the explanation of this... a few lines bellow
def eval_helper(ind):
return benchmarks.dtlz3(ind, 2)
toolbox.register("evaluate", eval_helper)
In [44]:
def uniform(low, up, size=None):
try:
return [random.uniform(a, b) for a, b in zip(low, up)]
except TypeError:
return [random.uniform(a, b) for a, b in zip([low] * size, [up] * size)]
toolbox.register("attr_float", uniform, BOUND_LOW, BOUND_UP, NDIM)
toolbox.register("individual", tools.initIterate, creator.Individual, toolbox.attr_float)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)
toolbox.register("mate", tools.cxSimulatedBinaryBounded, low=BOUND_LOW, up=BOUND_UP, eta=20.0)
toolbox.register("mutate", tools.mutPolynomialBounded, low=BOUND_LOW, up=BOUND_UP, eta=20.0, indpb=1.0/NDIM)
toolbox.register("select", tools.selNSGA2)
toolbox.pop_size = 200
toolbox.max_gen = 500
We add a experiment_name
to toolbox
that we will fill up later on.
In [45]:
toolbox.experiment_name = "$P_\mathrm{mut}="
We can now replicate this toolbox instance and then modify the mutation probabilities.
In [46]:
mut_probs = (0.05, 0.15, 0.3)
number_of_experiments = len(mut_probs)
toolboxes=list([copy.copy(toolbox) for _ in range(number_of_experiments)])
Now toolboxes
is a list of copies of the same toolbox. One for each experiment configuration (population size).
...but we still have to set the population sizes in the elements of toolboxes
.
In [47]:
for i, toolbox in enumerate(toolboxes):
toolbox.mut_prob = mut_probs[i]
toolbox.experiment_name = toolbox.experiment_name + str(mut_probs[i]) +'$'
In [48]:
for toolbox in toolboxes:
print(toolbox.experiment_name, toolbox.mut_prob)
As we are dealing with stochastic methods their results should be reported relying on an statistical analysis.
toolbox
instance in our case) should be repeated a sufficient amount of times. Note: I personally like the number 42 as it is the answer to The Ultimate Question of Life, the Universe, and Everything.
In [49]:
number_of_runs = 42
As we are now solving more demanding problems it would be nice to make our algorithms to run in parallel and profit from modern multi-core CPUs.
map()
function throu the toolbox
.multiprocessing
or concurrent.futures
modules.Note: While preparing this notebook I why process-based parallelization is much faster than thread-based parallelization. You can have a very good summary about this in http://blog.liang2.tw/2014-handy-dist-computing/.
In [50]:
from IPython.html import widgets
from IPython.display import display
In [51]:
progress_bar = widgets.IntProgressWidget(description="Starting...",
max=len(toolboxes)*number_of_runs)
Process-based parallelization based on multiprocessing
requires that the parameters passed to map()
be pickleable.
lambda
functions can not be directly used. lambda
fans out there! -me included.
In [52]:
def run_algo_wrapper(toolboox):
result,a = nsga_ii(toolbox)
pareto_sets = tools.emo.sortLogNondominated(result, len(result))
return pareto_sets[0]
%%time
import concurrent.futures
display(progress_bar)
results = {toolbox.experiment_name:[] for toolbox in toolboxes}
with concurrent.futures.ProcessPoolExecutor() as executor:
# Submit all the tasks...
futures = {executor.submit(run_algo_wrapper, toolbox): toolbox
for _ in range(number_of_runs)
for toolbox in toolboxes}
# ...and wait for them to finish.
for future in concurrent.futures.as_completed(futures):
tb = futures[future]
results[tb.experiment_name].append(future.result())
progress_bar.value +=1
progress_bar.description = "Finished %03d of %03d:" % (progress_bar.value, progress_bar.max)
This is not a perfect implementation but... works!
In [53]:
%%time
from multiprocessing import Pool
display(progress_bar)
results = {}
pool = Pool()
for toolbox in toolboxes:
results[toolbox.experiment_name] = pool.map(run_algo_wrapper, [toolbox] * number_of_runs)
progress_bar.value +=number_of_runs
progress_bar.description = "Finished %03d of %03d:" % (progress_bar.value, progress_bar.max)
As you can see, even this relatively small experiment took lots of time!
As running the experiments takes so long, lets save the results so we can use them whenever we want.
In [54]:
import pickle
In [55]:
pickle.dump(results, open('nsga_ii_dtlz3-results.pickle', 'wb'),
pickle.HIGHEST_PROTOCOL)
In case you need it, this file is included in the github repository.
To load the results we would just have to:
In [56]:
loaded_results = pickle.load(open('nsga_ii_dtlz3-results.pickle', 'rb'))
#results = loaded_results # <-- uncomment if needed
results
is a dictionary, but a pandas DataFrame
is a more handy container for the results.
In [57]:
res = pd.DataFrame(loaded_results)
Headers may come out unsorted. Let's fix that first.
In [58]:
res.head()
Out[58]:
In [59]:
res = res.reindex_axis([toolbox.experiment_name for toolbox in toolboxes], axis=1)
In [60]:
res.head()
Out[60]:
In [61]:
a = res.applymap(lambda pop: [toolbox.evaluate(ind) for ind in pop])
plt.figure(figsize=(15,3))
for i, col in enumerate(a.columns):
plt.subplot(1,len(a.columns), i+1)
for pop in a[col]:
x = pd.DataFrame(data= pop)
plt.scatter(x[0], x[1], alpha=0.5)
plt.title(col)
The local Pareto-optimal fronts are clearly visible!
Calculating nadir point: an individual that is “worse” than any other individual in each of the objectives.
In [62]:
def calculate_nadir(results):
alldata = np.concatenate(np.concatenate(results.values))
obj_vals = [toolbox.evaluate(ind) for ind in alldata]
return np.max(obj_vals, axis=0)
In [63]:
nadir = calculate_nadir(res)
In [64]:
nadir
Out[64]:
We can now compute the hypervolume of the Pareto-optimal fronts yielded by each algorithm run.
In [65]:
import deap.benchmarks.tools as bt
In [66]:
hypervols = res.applymap(lambda pop: bt.hypervolume(pop, nadir))
In [67]:
hypervols.head()
Out[67]:
In [68]:
hypervols.describe()
Out[68]:
In [69]:
fig = plt.figure(figsize=(15,3))
plt.subplot(1,2,1, title='Violin plots of NSGA-II with $P_{\mathrm{mut}}$')
seaborn.violinplot(hypervols, alpha=0.74)
plt.ylabel('Hypervolume'); plt.xlabel('Mutation probabilities')
plt.subplot(1,2,2, title='Box plots of NSGA-II with $P_{\mathrm{mut}}$')
seaborn.boxplot(hypervols, alpha=0.74)
plt.ylabel('Hypervolume'); plt.xlabel('Mutation probabilities');
Out[69]:
We start by writing a function that helps us tabulate the results of the application of an statistical hypothesis test.
In [70]:
import itertools
import scipy.stats as stats
In [71]:
def compute_stat_matrix(data, stat_func, alpha=0.05):
'''A function that applies `stat_func` to all combinations of columns in `data`.
Returns a squared matrix with the p-values'''
p_values = pd.DataFrame(columns=data.columns, index=data.columns)
for a,b in itertools.combinations(data.columns,2):
s,p = stat_func(data[a], data[b])
p_values[a].ix[b] = p
p_values[b].ix[a] = p
return p_values
The Kruskal-Wallis H-test tests the null hypothesis that the population median of all of the groups are equal.
In [72]:
stats.kruskal(*[hypervols[col] for col in hypervols.columns])
Out[72]:
We now can assert that the results are not the same but which ones are different or similar to the others the others?
In case that the null hypothesis of the Kruskal-Wallis is rejected the Conover–Inman procedure (Conover, 1999, pp. 288-290) can be applied in a pairwise manner in order to determine if the results of one algorithm were significantly better than those of the other.
Note: If you want to get an extended summary of this method check out my PhD thesis.
In [73]:
def conover_inman_procedure(data, alpha=0.05):
num_runs = len(data)
num_algos = len(data.columns)
N = num_runs*num_algos
_,p_value = stats.kruskal(*[data[col] for col in data.columns])
ranked = stats.rankdata(np.concatenate([data[col] for col in data.columns]))
ranksums = []
for i in range(num_algos):
ranksums.append(np.sum(ranked[num_runs*i:num_runs*(i+1)]))
S_sq = (np.sum(ranked**2) - N*((N+1)**2)/4)/(N-1)
right_side = stats.t.cdf(1-(alpha/2), N-num_algos) * \
math.sqrt((S_sq*((N-1-p_value)/(N-1)))*2/num_runs)
res = pd.DataFrame(columns=data.columns, index=data.columns)
for i,j in itertools.combinations(np.arange(num_algos),2):
res[res.columns[i]].ix[j] = abs(ranksums[i] - ranksums[j]/num_runs) > right_side
res[res.columns[j]].ix[i] = abs(ranksums[i] - ranksums[j]/num_runs) > right_side
return res
In [74]:
conover_inman_procedure(hypervols)
Out[74]:
We now know in what cases the difference is sufficient as to say that one result is better than the other.
Another alternative is the Friedman test.
In [75]:
hyp_transp = hypervols.transpose()
measurements = [list(hyp_transp[col]) for col in hyp_transp.columns]
stats.friedmanchisquare(*measurements)
Out[75]:
Mann–Whitney U test (also called the Mann–Whitney–Wilcoxon (MWW), Wilcoxon rank-sum test (WRS), or Wilcoxon–Mann–Whitney test) is a nonparametric test of the null hypothesis that two populations are the same against an alternative hypothesis, especially that a particular population tends to have larger values than the other.
It has greater efficiency than the $t$-test on non-normal distributions, such as a mixture of normal distributions, and it is nearly as efficient as the $t$-test on normal distributions.
In [76]:
raw_p_values=compute_stat_matrix(hypervols, stats.mannwhitneyu)
raw_p_values
Out[76]:
The familywise error rate (FWER) is the probability of making one or more false discoveries, or type I errors, among all the hypotheses when performing multiple hypotheses tests.
Example: When performing a test, there is a $\alpha$ chance of making a type I error. If we make $m$ tests, then the probability of making one type I error is $m\alpha$. Therefore, if an $\alpha=0.05$ is used and 5 pairwise comparisons are made, we will have a $5\times0.05 = 0.25$ chance of making a type I error.
One of these corrections is the Šidák correction as it is less conservative than the Bonferroni correction: $$\alpha_{SID} = 1-(1-\alpha)^\frac{1}{m},$$ where $m$ is the number of tests.
In [77]:
from scipy.misc import comb
alpha=0.05
alpha_sid = 1 - (1-alpha)**(1/comb(len(hypervols.columns), 2))
print(alpha_sid)
Let's apply the corrected alpha to raw_p_values
. If we have a cell with a True
value that means that those two results are the same.
In [78]:
raw_p_values.applymap(lambda value: value <= alpha_sid)
Out[78]:
In this class/notebook we have seen some key elements:
Bear in mind that: